8 Supplementary Material
For the GLOW experiment we stacked three GLOW transformations at different scales, each with eight affine coupling blocks spaced by actnorms and permutations, each parameterized by a CNN with two hidden layers of 512 filters (see the sketch below).

In a recent arXiv submission, Arjovsky et al. [2] suggested that in the presence of an observable variability in the environment e (e.g. ...

While this procedure worked on distributions that were very similar to begin with, in the majority of cases the log-likelihood fit to B did not provide informative gradients when evaluated on the transformed dataset, as the KL-divergence between distributions with disjoint supports is infinite. The code is available in lrmf_gradient_simulation.ipynb. The LRMF objective (Eq. 2) decreases over time and reaches zero when the two datasets are aligned.
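Below is a minimal PyTorch sketch of one scale of this stack: eight actnorm → permutation → affine-coupling blocks, each coupling parameterized by a CNN with two hidden layers of 512 filters. It is an illustration under our own assumptions (fixed random channel permutations, even channel counts, image inputs of shape (B, C, H, W)); the class names are ours, not the authors' code, and GLOW proper uses invertible 1×1 convolutions in place of fixed permutations.

```python
import torch
import torch.nn as nn

class ActNorm(nn.Module):
    """Per-channel affine normalization with learned log-scale and bias."""
    def __init__(self, channels):
        super().__init__()
        self.log_s = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        # log|det| of a per-channel scaling, summed over spatial positions
        logdet = self.log_s.sum() * x.shape[2] * x.shape[3]
        return (x + self.bias) * self.log_s.exp(), logdet

class Permutation(nn.Module):
    """Fixed random channel permutation (volume-preserving, logdet = 0)."""
    def __init__(self, channels):
        super().__init__()
        self.register_buffer("perm", torch.randperm(channels))

    def forward(self, x):
        return x[:, self.perm], x.new_zeros(())

class AffineCoupling(nn.Module):
    """Half of the channels predict an elementwise affine map of the other half."""
    def __init__(self, channels, hidden=512):
        super().__init__()
        half = channels // 2  # assumes an even channel count
        # CNN with two hidden layers of 512 filters, as described in the text.
        self.net = nn.Sequential(
            nn.Conv2d(half, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, 2 * half, 3, padding=1),
        )

    def forward(self, x):
        xa, xb = x.chunk(2, dim=1)
        log_s, t = self.net(xa).chunk(2, dim=1)
        log_s = log_s.tanh()  # keep scales bounded for numerical stability
        y = torch.cat([xa, xb * log_s.exp() + t], dim=1)
        return y, log_s.flatten(1).sum(dim=1)

class FlowScale(nn.Module):
    """Eight coupling blocks, each preceded by an actnorm and a permutation."""
    def __init__(self, channels, n_blocks=8, hidden=512):
        super().__init__()
        self.blocks = nn.ModuleList(
            m for _ in range(n_blocks)
            for m in (ActNorm(channels), Permutation(channels),
                      AffineCoupling(channels, hidden))
        )

    def forward(self, x):
        total = x.new_zeros(x.shape[0])
        for block in self.blocks:
            x, logdet = block(x)
            total = total + logdet  # accumulate log|det J| for the likelihood
        return x, total
```

Three such scales, connected by squeeze operations that trade spatial resolution for channel count, would give the three-scale stack described above.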
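For intuition about why disjoint supports break the gradient signal, consider a toy one-dimensional computation (our illustration, not from the paper):

\[
P = \mathcal{U}[0,1], \qquad Q = \mathcal{U}[2,3]:
\qquad
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \int_0^1 p(x)\,\log\frac{p(x)}{q(x)}\,dx \;=\; +\infty,
\]

since q(x) = 0 everywhere that p(x) > 0. By the same token, the log-likelihood of a model fit to B sits at its floor on data lying outside B's support, so its gradient with respect to the transformation parameters is uninformative.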
A.1 Cost for incentivization

We justify the way in which LIO accounts for the cost of incentivization as follows. However, both the reward-giver and recipients require sufficient time to learn the effect of incentives, which means that too large an α would lead to the degenerate result of r_η^i = 0. At the other extreme, α = 0 means there is no penalty and may result in profligate incentivization that serves no useful purpose.

Let θ^i for i ∈ {1, 2} denote each agent's probability of taking the cooperative action. Each plot has a fixed value for the incentive given for the other action.

Each agent observes all agents' positions and can move among the three available states: lever, start, and door (a minimal environment sketch follows).
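The following is a minimal sketch of such an Escape Room-style environment with the three states named above (lever, start, door), assuming a door that opens when at least M agents stand at the lever. The reward constants (+10 for escaping, -1 per move), the defaults N = 2 and M = 1, and all names are illustrative assumptions, not the authors' environment.

```python
from dataclasses import dataclass

STATES = ("lever", "start", "door")

@dataclass
class EscapeRoom:
    n_agents: int = 2
    m_required: int = 1  # agents needed at the lever to open the door

    def reset(self):
        self.positions = ["start"] * self.n_agents
        return list(self.positions)

    def step(self, actions):
        """actions[i] is the state agent i moves to (or stays in)."""
        rewards = [0.0] * self.n_agents
        for i, target in enumerate(actions):
            assert target in STATES
            if target != self.positions[i]:
                rewards[i] -= 1.0  # illustrative movement cost
            self.positions[i] = target
        # Every agent observes all agents' positions, as stated in the text.
        door_open = self.positions.count("lever") >= self.m_required
        done = False
        for i, pos in enumerate(self.positions):
            if pos == "door" and door_open:
                rewards[i] += 10.0  # illustrative escape reward
                done = True
        return list(self.positions), rewards, done

if __name__ == "__main__":
    env = EscapeRoom()
    env.reset()
    # Agent 1 pays the movement cost to pull the lever; agent 0 escapes.
    print(env.step(["door", "lever"]))  # (['door', 'lever'], [9.0, -1.0], True)
```

With these defaults the lever-puller bears the movement cost while the other agent collects the escape reward, which is the kind of asymmetry that incentivization is meant to resolve.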